[57] ABSTRACT

A computer system, computer program product, and method 
for visually mapping data between different record formats 
provides for the mapping of source data fields of a dump file 
to the target data format fields of a digital library using an 
interactive mapping section output map. The output map 
includes a grid with cells to indicate crossings between the 
source data fields and the target data format fields, and the 
user indicates such crossings without resort to a custom 
applications loader thus providing dynamic data mapping at 
execution time. 
METHOD FOR VISUALLY MAPPING DATA BETWEEN DIFFERENT RECORD FORMATS

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to databases. More particularly, this invention relates to a method for loading data into a database.

2. Background and Related Art

Every database management system is based on a general database model. The following are examples of well-known database models: the hierarchical model, the network model, and the relational model. A database management system based on the relational model may be referred to as a relational database management system (RDBMS). An RDBMS is a system of computer programs that facilitates the creation, management, and manipulation of relational databases.

Every relational database is based on the relational model. The relational model is familiar to one of skill in the art. The book "An Introduction to Database Systems" by C. J. Date (Addison-Wesley Publishing Company) provides an in-depth guide to the relational model, and hereby is incorporated in its entirety by reference. An example of an RDBMS is DB2, which commercially is available through International Business Machines Corporation.

According to the relational model, data is perceived to exist as a collection of relational tables. A relational table expresses a relation between things. Relational tables are characterized by rows and columns. Although the rows and columns of relational tables may be employed in many ways, the relational model provides that columns pertain to entities or attributes of entities, and that rows pertain to specific instances of entities or specific instances of attributes of an entity.

The rows and columns of a relational table intersect to define data cells. In this discussion, the term record may be used to refer to a row; the terms attribute and field may be used to refer to a column.

Although the structure of the relational model provides for tables, rows, columns, and cells, a certain hierarchy may be observed within the model. That is, a relational database comprises one or more tables; each table comprises one or more rows; each row comprises one or more cells. Thus, the relational model defines four adjacent layers of hierarchy: databases, tables, rows, and cells. The tables layer is the next higher layer of the rows layer. The cells layer is the next lower layer of the rows layer. The tables layer is adjacent the rows layer, but is not adjacent the cells layer. Moreover, a given table may be referred to as an instance of the table layer, a given row as an instance of the row layer, and so on.

Although the relational terminology of tables, rows, columns, and cells is used throughout this description, one of skill in the art will appreciate that the concepts presented herein may be applied outside of the relational model to great advantage. In particular, the concepts are applicable in any database environment in which the data model similarly includes a hierarchy of adjacent layers.

Each column of a relational table has a respective datatype. The datatype of a column restricts the values that the cells of that column may hold. For instance, a traditional datatype for a column of a relational table is the integer datatype. If a column has the integer datatype, the cells of that column may have only integer values. Variations on the integer datatype include the small and the large integer datatypes. The small integer datatype is so named because it conventionally is limited in length to half of a word. The large integer datatype, by contrast, may be allocated two words.

Other traditional datatypes include packed decimal, floating point, fixed length character, and variable length character datatypes. As is the case with the integer datatype, variations exist with respect to the other datatypes. Some special purpose variations of the traditional datatypes include logical, money, date, and time.

RDBMSs recently have been improved to provide support also for nontraditional datatypes. Some supported nontraditional datatypes include images, video, large objects (LOBs), and audio. In other words, a cell of a relational table now may contain data that is an image, a video segment, a text of great length (such as a book), or an audio segment. The columns of a relational table now may have nontraditional datatypes as their respective datatypes. Other nontraditional datatypes either presently are or soon will be supported. Examples of other nontraditional datatypes are spreadsheets, lists, and tables, to name but a few.

Application programs access the data of relational tables by making calls to a database server. As used herein, the term "applications programs" may refer to several separate programs, only one program, a module of a program, or even a particular task of a module. An applications program may be written by an applications programmer. Applications programmers develop applications programs using any of a number of programming languages. During development and design of applications programs, applications programmers may adhere to a programming methodology. A programming methodology is a set of principles by which analysis is performed and by which design decisions are made. Programming methodologies may be referred to as programming paradigms. Examples of widely-known programming paradigms include the top-down, the data-driven, and the object oriented (OO) programming paradigms.

Turning now to consider the data, instead of the database, it may be observed that information in many organizations is held in digital form in repositories which are not part of the same data library, the same computing systems, or even the same administrative domain. This has hampered access to the information held in those separate repositories, even though the information held separately may be related. For example, an organization may have information residing in completely different data processing systems. These different data processing systems may be in place as a result of combining previous projects, or because of mergers or acquisitions of companies having different data processing systems. It is a common occurrence that valuable data resides and is used in separate and distinct libraries, computing systems, or administrative domains.

A problem many such organizations face is that information held in such heterogeneous data stores may, in the minds of people within the organization, be related conceptually. Such data, however, remains unrelated at a data processing level. In other words, the information in one database is not accessible along with the information in another database. Hence, that information can be difficult to handle, and its full value remains unrealized until the unrelated data is joined. Collected into carefully managed records, such information is at the core of what it means to have a library. If the collection is held in digital format, it is known as a digital library.

A digital library is described in U.S. Pat. No. 5,649,185 to Antognini et al., which is incorporated herein by reference. A digital library uses a database, but also allows application programs, residing on a library client, to interact with the underlying digital library services, and hence the underlying database, to store and retrieve information.

One way to add information to a digital library is to incorporate the source information from wherever it occurs into this specialized repository. This way of adding information is the primary subject of the invention.

For the sake of clarity, certain terms will now be discussed. The term target digital library means a digital library or a database that a user is using or desires to use. The target digital library requires data to be in one of a plurality of target data formats. The target digital library typically has many target formats, and might have one target format for each table defined within it.

The term unusable data, or source data, refers to data that is stored in a form not directly usable by the target digital library because it is in a form that does not match one of the plurality of target data formats. Source data is typically available from a source database or a source data store (e.g., a magnetic tape, disk, or the like). The unusable data is said to be in a source form, to have a source format, or to have a source data format. To be usable to the target digital library, the source data may be converted from the source data format to one of the plurality of target formats of the target digital library.

In loading data into a target digital library, a preliminary step is usually to create a dump file. A dump file is often produced by an ASCII dump of the source data from the source database. It will be understood that an ASCII dump is a feature commonly available in nearly every database management system, and in nearly every computer system. For example, data preserved on reels of tape may commonly be dumped to a dump file in ASCII. It will be appreciated that ASCII is here used merely as an example, and that EBCDIC or any other manner of representing data may instead be used. It also will be understood that a dump file need not necessarily be a file stored on a disk, but may include a stream of electronic impulses which are generated and provided to a process without any intermediate storage per se of the data.

A dump file can be of many different formats vis-a-vis how the data is logically separated. In one example of a dump file format, records are separated by one or more separator characters. In another example of a dump file format, there is one record per line. In yet another example of a dump file format, there are multiple records per line. Likewise, fields may be distinguished one from another by separator characters, lines, or the like, and may be fixed or variable in length.
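The separator-based layouts above can be sketched in Python. This is a minimal sketch, not the parser of the invention: it handles only the one-record-per-line layout (not multiple records per line or fixed-width fields), and it assumes a hypothetical heuristic that tries semicolon, tab, comma, and space in that priority order and accepts the first separator that splits every line into the same number of fields, more than one. The function names and the sample records are made up.

```python
# Candidate field separators, tried in this (assumed) priority order.
CANDIDATE_SEPARATORS = (";", "\t", ",", " ")


def detect_separator(lines, candidates=CANDIDATE_SEPARATORS):
    """Return (separator, field_count) for the first candidate that splits
    every non-blank line into the same number of fields, greater than one."""
    records = [line for line in lines if line.strip()]
    if not records:
        raise ValueError("dump contains no records")
    for sep in candidates:
        counts = {len(line.split(sep)) for line in records}
        if len(counts) == 1:
            (n,) = counts
            if n > 1:
                return sep, n
    raise ValueError("no separator yields a consistent field count")


def parse_dump(text):
    """Parse a one-record-per-line dump into lists of field values."""
    records = [line for line in text.splitlines() if line.strip()]
    sep, _ = detect_separator(records)
    return [line.split(sep) for line in records]


# Hypothetical sample: semicolon-separated, one record per line.
sample = "10001;Ann;Lee;Sales;Boston\n10002;Raj;Rao;Audit;Austin\n"
print(detect_separator(sample.splitlines()))  # (';', 5)
print(parse_dump(sample)[1])  # ['10002', 'Raj', 'Rao', 'Audit', 'Austin']
```

A consistent field count across lines is only a heuristic; a real parser would also let the user confirm the result, since a separator can appear consistently inside field values by coincidence.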

In the target digital library, there are a plurality of target data formats. This plurality of target data formats may number in the hundreds. For the sake of clarity, the target data format that the source data must be converted into shall be referred to as a desired target data format. The selection of the desired target data format will depend on how, logically, the source data is to be included in the target digital library. A term which may be used interchangeably with target data format is the term index class. A digital library thus may be said to include a plurality of index classes.
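To make the idea of an index class concrete, a target data format can be modelled as a named, ordered collection of typed attributes, and the digital library as a catalog from which one desired format is selected by name. The classes `Attribute`, `IndexClass`, and `TargetLibrary` are hypothetical. The index class name `Employee_Info` and its 10-position ID attribute are borrowed from the example later in this description; the other attribute names, types, and lengths are invented for illustration (the example class actually has nine attributes).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """One field of a target data format (index class)."""
    name: str
    datatype: str  # illustrative type names, not any product's type catalog
    length: int


@dataclass(frozen=True)
class IndexClass:
    """A named target data format: an ordered tuple of attributes."""
    name: str
    attributes: tuple

    def attribute_names(self):
        return [a.name for a in self.attributes]


@dataclass
class TargetLibrary:
    """Hypothetical stand-in for a digital library's set of index classes."""
    index_classes: dict = field(default_factory=dict)

    def register(self, index_class):
        self.index_classes[index_class.name] = index_class

    def select(self, name):
        """Pick the desired target data format by name."""
        return self.index_classes[name]


library = TargetLibrary()
library.register(IndexClass("Employee_Info", (
    Attribute("ID", "INTEGER", 10),
    Attribute("FirstName", "VARCHAR", 32),  # hypothetical attribute
    Attribute("LastName", "VARCHAR", 32),   # hypothetical attribute
)))
print(library.select("Employee_Info").attribute_names())
# ['ID', 'FirstName', 'LastName']
```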

One approach to working with source data in a source data format that is not one of a plurality of target data formats is to write a custom loader application. In other words, to load the data from a dump file, an application programmer writes a custom loader application. Such a custom loader application must understand the format of the dump file, must read the fields from the dump file, and then must assign each value read from the dump file to the appropriate field of the desired target data format corresponding to the desired data structure. This assignment must be based on knowledge of the record structure of the target digital library.

A problem with the use of custom loader applications is that many different formats are possible for the source data, there are typically many input files of source data all in different source data formats, and there are many different target data formats. The problem, more particularly, is that many custom loader applications must be written. The writing of custom loader applications may be time-consuming, and such applications often are not reusable.

Another approach to working with source data which is in a source data format that is not one of a plurality of target data formats is described in U.S. Pat. No. 5,421,001 to Methe. Methe suggests an improved method of writing custom loader programs. According to Methe, there must be provided a common interface between all of the multiple foreign file formats (i.e., the source data format and the plurality of target data formats). This common interface is to be achieved by translating the elements of the source data format and the plurality of target data formats (which must be known a priori) into what amounts to a third, common format. The Methe approach allows an application programmer to use this common interface and common format for reading and writing in the multiple foreign file formats. In other words, the Methe approach applied to the problem of creating a suitable loader program is to write the software so as to translate the source data format and the plurality of target data formats into a predetermined common format upon opening the dump file, to write statements that manipulate the fields of the records in this common format, and then to write statements that translate the data from this common format into the desired target data format(s) for writing into the target digital library.

The Methe approach allows an application programmer to reduce development time by being less concerned about differing file formats. The application programmer can be less concerned about differing file formats because he can write the data manipulation statements with the predetermined common format in mind. Although the use of a predetermined common format thus may be advantageous over the approach of writing a custom loader application from scratch, the approach is not without its shortfalls.

One problem with the Methe approach is that the application developer must decide what component or components of the source data in the source data format are to be read as he writes the loader program. Likewise, the application programmer must also decide the location or locations of the target digital library (and, correspondingly, the desired target data format or formats) to which the source data, after conversion to the common format, is to be written. These data correspondence decisions thus are statically bound when the program is compiled. Consequently, adopting the Methe approach makes it impossible to alter these decisions without rewriting the loader program.
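The difference between correspondences that are statically bound and correspondences supplied at run time can be shown with two toy loaders. Both functions are hypothetical illustrations, not the loaders of Methe or of the invention: the first hard-codes which source field feeds which target attribute, while the second receives the same correspondences as data, so a new correspondence needs no new code. The record values, field positions, and attribute names are invented.

```python
def custom_loader(fields):
    """Correspondences fixed in code: changing them means editing and
    rebuilding the loader."""
    return {"ID": fields[0], "LastName": fields[2], "Department": fields[3]}


def mapped_loader(fields, mapping):
    """Correspondences supplied as data at run time.

    mapping: target attribute name -> zero-based source field position.
    """
    return {attr: fields[pos] for attr, pos in mapping.items()}


record = ["10001", "Ann", "Lee", "Sales", "Boston"]  # hypothetical record
same = {"ID": 0, "LastName": 2, "Department": 3}
print(custom_loader(record) == mapped_loader(record, same))  # True

# A different correspondence needs only a different mapping, not new code.
print(mapped_loader(record, {"ID": 0, "Location": 4}))
# {'ID': '10001', 'Location': 'Boston'}
```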

The custom loader application approach and the Methe approach both suffer from the drawback that the data correspondence decisions are coded into the loader applications.

SUMMARY OF THE INVENTION 

It is an object of this invention to overcome the deficiencies and shortcomings mentioned above. In particular, it is
an important object of this invention to set forth a method, a program product, and a computer system for dynamically controlling the mapping of selected data elements of source data in a source data format to particular data elements in a desired target data format selected from a plurality of target data formats.

In brief, this invention combines three elements to load data from files of varying data formats. More particularly, one of the three elements is an index class which specifies the data that the target digital library application expects. A second of the three elements is a data file parser to parse the dump file. The third of the three elements is a mapper to specify the mapping of the fields from the dump file into the index class (i.e., from the source data in the source data format into the desired target data format of the plurality of target data formats). The mapper is an important aspect of the invention, and allows users visually and dynamically to map which field from the dump file is to be placed in which field of the index class.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a conceptual view of source data and a target digital library.

FIG. 2 illustrates a plurality of target data formats of a target digital library.

FIG. 3 shows how the plurality of target data formats relate to data tables in a target digital library.

FIG. 4 illustrates an example of a desired target data format.

FIG. 5 depicts a data table having records with attributes as specified by the desired target data format of FIG. 4.

FIG. 6 shows source data in an exemplary source data format.

FIG. 7 shows, in schematic, an embodiment of the invention.

FIG. 8 shows a flowchart relating to an embodiment of the invention.

FIG. 9 shows an exemplary embodiment of the mapping section output map according to the invention.

FIG. 10 shows an exemplary embodiment of the mapping section output map according to the invention at a later stage.

FIG. 11 depicts the data table of FIG. 5 after application of the invention to the source data of FIG. 6.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The presently preferred embodiment of the invention will be explained with reference to the above-identified figures. Prior to such an explanation, however, certain terms will be explained.

Although the description will focus on teaching the invention as a series of steps in a method, it will be appreciated that the invention may be embodied in a computer system that contains hardware and software enabling it to perform the described operations. Similarly, the invention may be embodied in a computer program product.

On a practical level, the software that enables the computer system to perform the above-identified approach and operations of the invention is supplied on any one of a variety of media. Furthermore, the actual implementation of the approach and operations of the invention may actually be statements written in a programming language. Such programming language statements, when executed by a computer, cause the computer to act in accordance with the particular content of the statements. Furthermore, the software that enables a computer system to act in accordance with the invention may be provided in any number of forms including, but not limited to, original source code, assembly code, object code, machine language, compressed or encrypted versions of the foregoing, and any and all equivalents.

One of skill in the art will appreciate that "media", or "computer-readable media", as used here, may include a diskette, a tape, a compact disc, an integrated circuit, a cartridge, a remote transmission via a communications circuit, or any other similar medium usable by computers. For example, to supply software for enabling a computer system to operate in accordance with the invention, the supplier might provide a diskette or might transmit the software in some form via satellite transmission, via a direct telephone link, or via the Internet.

Although the enabling software might be "written on" a diskette, "stored in" an integrated circuit, or "carried over" a communications circuit, it will be appreciated that, for the purposes of this application, the computer usable medium will be referred to as "bearing" the software. Thus, the term "bearing" is intended to encompass the above and all equivalent ways in which software is associated with a computer usable medium.

For the sake of simplicity, therefore, the term program product is used to refer to a computer usable medium, as defined above, which bears, in any form, software to enable a computer system to operate according to the invention. Thus, the invention is also embodied in a program product that includes a computer readable medium bearing software which enables a computer to perform operations according to the invention.

The invention is intended to be construed not only with respect to the example described below, but with respect to any and all equivalents in accordance with the appended claims.

FIG. 1 shows source data 100 and a target digital library 200. The particular symbols are used for explanation only, and one knowledgeable in the art will appreciate that the source data 100 and the target digital library 200 may reside on any computer readable medium. Furthermore, the target digital library 200 is shown with a symbol representing data storage, but it will be understood that the target digital library 200 includes also a set of application programs and [...]. Although the source data 100 and the target digital library 200 are shown as being held in one location, both may actually be distributed across different platforms and even locations. That is, although the target digital library 200 may include data held in geographically distant locations, the target digital library 200 may conceptually be understood as being a single entity.

FIG. 2 shows target digital library 200. Included in target digital library 200 are a plurality of target data formats indicated generally at 300. Each of the plurality of target data formats may be different, although there is no requirement that they be different. In actual situations, the tables of a digital library typically have different formats.

FIG. 3 shows three of the plurality of target data formats 300 of target digital library 200. In particular, one target data format 310 describes the fields, or attributes, of the records of data table 410. Another target data format 320 describes the attributes of table 420 of the target digital library 200. Yet another target data format 330 describes the attributes of
table 430. The correspondence between the illustrated target data formats and certain illustrated ones of the data tables of target digital library 200 is shown with broken lines with arrowheads at each end.

FIG. 4 shows, in more detail, information concerning exemplary target data format 330. In particular, target data format 330 describes a table which has nine attributes for each record. This target data format 330 may be named, e.g., Employee_Info. To put it another way, there is an index class named Employee_Info which describes a table with nine particular attributes for each record.

FIG. 5 shows an example of a table 430 which is based on the target data format or index class Employee_Info 330. More particularly, FIG. 5 shows two data records in table 430. The columns of the table, except for the first blank column, correspond to those named in target data format 330. Each record, in other words, relates to an employee and stores nine attributes of information about the employee. Each attribute has a particular datatype as shown in FIG. 4. For purposes of illustration, the values in some of the fields have been shortened. For example, the ID shown in table 430 is only 5 integer positions long even though the ID field defined for the Employee_Info index class 330 is 10 integer positions long. It will be appreciated that this has been done for illustration purposes only.

The table 430 may be named Retired_Employee_List, for example. As is apparent to one knowledgeable in this field, the Employee_Info index class 330 may be used for many different tables relating to employees.

FIG. 6 shows source data 100 in more detail. In particular, source data 100 includes dump file 110. Dump file 110 has several records, two of which are shown in their entirety. The symbol ↵ denotes a new line indicator. In this instance, the dump file has five fields, each separated by a semicolon character, with one record per line. The first field is the employee serial number, the second field is the first name, the third is the last name, the fourth is the department, and the fifth is the location. For this example, it may be assumed that the source data relates to employees retired in the past. Presently, it is desired to add this information to the target digital library 200. Clearly, the source data format of five fields is different from the target data format of nine fields. In terms of the drawing figures, the source data format of the dump file 110 does not match the index class Employee_Info 330 used for the Retired_Employee_List 430.

Since the source data format is not the desired target data format, a data conversion must be performed. As already mentioned, one approach would be to write a custom loader application to perform this conversion. Another approach would be to write a loader application which converts the source data format of dump file 110 and also the target data format 330 of table 430 into a common form, then sets the data correspondences using this common format, and then (after the necessary copying) returns the data into the target data format 330 of table 430.

The approach of the invention may be understood with reference to FIGS. 7 and 8. FIG. 7 shows a schematic diagram, and FIG. 8 shows a flowchart according to the invention.

In FIG. 7, dump file 110 of source data 100 is provided as a dump file input signal 150 to dump file parse section 500. It will be understood that dump file input signal 150 may be represented by a stream of electronic impulses in a well known manner. Furthermore, dump file input signal 150 or any of the other items described as signals below may also be implemented as a file written in memory. The exact implementation is not critical to the invention. It will also be understood that dump file parse section 500 may be a process active on a computer system or, possibly, specialized hardware designed for a particular kind of source data 100.

Dump file parse section 500 analyzes dump file input signal 150 to determine certain information concerning dump file 110. In particular, dump file parse section 500 determines the number of fields for each record. This is accomplished by analyzing the patterns of certain commonly used separators such as semicolons, tab characters, new line indicators, spaces, commas, and the like. Dump file parse section 500 optionally confirms the correct parsing of dump file input signal 150 representing dump file 110 by interacting with the user. Further details concerning the operation of dump file parse section 500 are omitted in view of the well known status of such parsing operations in this field.

Dump file parse section 500 outputs a dump file parse signal 550 to mapping section 700. Dump file parse signal 550 includes information relating to the number of fields in the source data format of dump file 110. Optionally, dump file parse signal 550 further includes information relating to the number of rows or records, the general datatype of each field, and sample content for display. Further, dump file parse signal 550 optionally includes information internal to dump file 110 that indicates the names of the fields in the source data format. Such information may be referred to as header information. In this simplified example, dump file parse section 500 outputs only a number of fields as dump file parse signal 550. It will be understood that dump file parse signal 550 may be represented by a stream of electric impulses in a well known manner or as a file written in memory, as already mentioned above with respect to dump file input signal 150.

Index class selection section 600 uses, as input, the plurality of target data formats 300 of the target digital library 200. To put it more concretely, index class selection section 600 may receive from target digital library 200 an index class selection input 350 which contains, in electronic form, information relating to a desired target data format 330. More particularly, index class selection section 600 may be a process which allows a user to select a desired target data format 330 from the plurality of target data formats 300 and which extracts the target data format 330 from the target digital library 200. As an output, index class selection section 600 produces index class selection signal 650. Index class selection signal 650 is provided to mapping section 700 and includes, typically, at least the names of the attributes for each record of the selected index class (i.e., of the desired target data format 330). Optionally, index class selection signal 650 may also include datatype information and even sample data from data table 430.

It will be understood that the order of execution between dump file parse section 500 and index class selection section 600 is immaterial. Either one may precede the other, or both may be executed in parallel. Further, it will be appreciated that both sections may be processes or objects running on the same or different computing platforms.

After receiving the dump file parse signal 550 and the index class selection signal 650, mapping section 700 dynamically produces a mapping section output map 750 for display on visual display unit 800. An example of mapping section output map 750 is shown in FIG. 9. In particular, mapping section output map 750 represents a table or grid by which each of the plurality of fields identified in the dump
file parse signal 550 is crossed with each of the attributes identified in the index class selection signal 650. The embodiment shown in FIG. 9 shows a grid, although the precise format is not essential. In other words, the rows and columns may be interchanged, and the headings may be put in any order.

Another way to describe the grid more generally is that it is a set of cells, each of which is arranged with respect to a first direction and a second direction, the first and second directions being orthogonal. Thus, the first direction may refer either to rows or to columns. The second direction may thus refer to rows when the first direction refers to columns, or to columns when the first direction refers to rows. For simplicity, the rows and columns terminology will primarily be used in this description.
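Because the crossings, not the orientation, carry the information, a grid and its transpose describe the same mapping. A minimal sketch, assuming a grid held as nested dictionaries keyed attribute-first (a representation chosen here for illustration, not specified by the text), where any non-empty cell counts as a marked crossing; the function names and sample grid are hypothetical:

```python
def crossings_from_rows(grid):
    """(field, attribute) pairs marked in a grid keyed attribute-first."""
    return {(fld, attr)
            for attr, row in grid.items()
            for fld, mark in row.items() if mark}


def transpose(grid):
    """Swap the two directions, giving a grid keyed field-first."""
    out = {}
    for attr, row in grid.items():
        for fld, mark in row.items():
            out.setdefault(fld, {})[attr] = mark
    return out


def crossings_from_columns(tgrid):
    """The same pairs, read from a grid keyed field-first."""
    return {(fld, attr)
            for fld, col in tgrid.items()
            for attr, mark in col.items() if mark}


grid = {"ID": {"Field1": "X", "Field2": ""},
        "LastName": {"Field1": "", "Field2": "X"}}
print(crossings_from_rows(grid) == crossings_from_columns(transpose(grid)))  # True
```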

In detail, the grid of FIG. 9 shows, in the first row 760, a blank in the first column 770 followed by each of the fields identified in the dump file parse signal 550. It will be recalled that the dump file parse signal 550 includes at least the number of fields. By this, it is meant that the dump file parse signal 550 may actually contain an integer, such as the integer 5. In this case, mapping section 700 would be adapted to create five field names, such as Field1, Field2, . . . , Field5, as shown in FIG. 9. Alternatively, the dump file parse signal 550 may indicate the number of fields by enumerating them. In this alternative, the mapping section 700 would be adapted to use the names provided as the field names in the first row 760 of the grid, and would further be adapted to count the names provided in the dump file parse signal 550 to determine how many fields are included. Either way, the dump file parse signal 550 may be said to include the number of fields by its content.
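The two ways in which the dump file parse signal 550 may convey the number of fields can be illustrated with a short sketch. It assumes the signal is modeled as either a plain integer or a list of enumerated names; the function name first_row_headers is hypothetical and not part of the described embodiment.

```python
from typing import List, Union


def first_row_headers(parse_signal: Union[int, List[str]]) -> List[str]:
    """Return the dump file field names for the first row of the grid.

    Hypothetical helper: parse_signal is either a field count (e.g. 5)
    or an explicit list of field names enumerated by the parser.
    """
    if isinstance(parse_signal, int):
        # Only a count was supplied: synthesize Field1 ... FieldN.
        return [f"Field{i}" for i in range(1, parse_signal + 1)]
    # Names were enumerated: use them as given; the count is their length.
    return list(parse_signal)


print(first_row_headers(5))  # ['Field1', 'Field2', 'Field3', 'Field4', 'Field5']
```

In the enumerated case, the number of fields is simply len() of the returned list, which matches the description of counting the names provided.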

In FIG. 9, the grid contains, in the first column 770, a 
blank in the first row 760 followed by each of the attributes 
identified in the index class selection signal 650. The rows 
and columns of the grid, except for the first row and the first 
column, define cells 780 of the grid. Each cell 780 represents 
the possible cross between one of the plurality of dump file 
fields in the first row 760 and a corresponding one of the 
plurality of index class attributes (it will be recalled that 
index class attributes may be referred to also as target data 
format fields) in the first column 770. For clarity, only a few 
of the cells 780 are indicated by lead lines in FIG. 9. 
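A minimal sketch of the FIG. 9 layout follows, assuming the grid is held in memory as a list of rows of strings. The name build_grid is hypothetical, and the attribute names in the usage line are borrowed from the Employee_Info example for illustration only; the full attribute list of FIG. 9 is not reproduced in this text.

```python
from typing import List


def build_grid(dump_fields: List[str], target_attrs: List[str]) -> List[List[str]]:
    """Build a grid as a list of rows (hypothetical representation)."""
    # First row 760: a blank corner cell, then one heading per dump file field.
    rows = [[""] + list(dump_fields)]
    for attr in target_attrs:
        # First column 770 holds the attribute; the remaining cells 780 start
        # empty, ready to receive a user's indication of a crossing.
        rows.append([attr] + [""] * len(dump_fields))
    return rows


grid = build_grid(["Field1", "Field2"], ["ID", "First Name", "Middle Initial"])
for row in grid:
    print(row)
```

Swapping the roles of rows and columns, which the description permits, would only require transposing this structure.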

The grid provides, in each of its cells 780, a means (not shown) for the user of the program to indicate dynamically a pairing between a selected one of the dump file fields and a selected one of the index class attributes. Such a means may include, but is not limited to, a checkbox field or a text entry field. In the embodiment shown in FIG. 10, the user enters a letter X to indicate the desired crossing between a selected dump file field and a selected index class attribute. The mapping section output map 750 in the described embodiment thus comprises a grid having cells 780, each of which includes a field for indicating a desired crossing. More generally, the mapping section output map 750 is a means for indicating crossings between the plurality of dump file fields and the plurality of desired target data format fields. Put another way, the mapping section 700 may itself be understood as a means for determining crossings between the plurality of dump file fields and the plurality of desired target data format fields.
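Reading the user's X marks back out of the cells might look like the following sketch. It assumes the list-of-rows grid representation used for illustration here (header row first, attribute names in column 0) and treats the mark case-insensitively; read_crossings is a hypothetical helper, and the sample grid is made up.

```python
from typing import Dict, List


def read_crossings(grid: List[List[str]], mark: str = "X") -> Dict[str, str]:
    """Return {dump_field: target_attribute} for every cell containing mark."""
    dump_fields = grid[0][1:]
    crossings: Dict[str, str] = {}
    for row in grid[1:]:
        attr, cells = row[0], row[1:]
        for field, cell in zip(dump_fields, cells):
            # A cell counts as a crossing only if it holds the mark.
            if cell.strip().upper() == mark.upper():
                crossings[field] = attr
    return crossings


grid = [
    ["",               "Field1", "Field2", "Field3"],
    ["ID",             "X",      "",       ""],
    ["First Name",     "",       "x",      ""],
    ["Middle Initial", "",       "",       ""],
]
print(read_crossings(grid))  # {'Field1': 'ID', 'Field2': 'First Name'}
```

Field3 has no crossing, so it does not appear in the result, and Middle Initial is left unmapped.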

Likewise, using the particular terminology of the digital library, the mapping section output map 750 is a means for indicating crossings between the plurality of dump file fields and the attributes of the selected index class, and the mapping section 700 itself provides a means for determining crossings between the plurality of dump file fields and the attributes of the selected index class.

It is an important aspect of this invention that the mapping section 700 operates dynamically to produce the mapping section output map 750. Thus, after the crossings have been determined by the mapping section 700, this information is used to map Field1 of the dump file to attribute ID of the Employee_Info index class 330, Field2 to attribute First Name of the Employee_Info index class 330, and so on. Given these mappings, it is straightforward to write the data from dump file 110 to data table 430, because data table 430 is based on the Employee_Info index class 330 and therefore has, as its fields, the attributes defined in that index class.
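Once the crossings are known, writing dump file data in the target format can be sketched as below. The sketch assumes records arrive as positional lists of strings, that the crossings map dump field names to attribute names, and that a target attribute with no crossing receives None as a stand-in for Null. The name map_records and the sample values are invented for illustration.

```python
from typing import Dict, List, Optional


def map_records(
    records: List[List[str]],
    dump_fields: List[str],
    crossings: Dict[str, str],
    target_attrs: List[str],
    default: Optional[str] = None,
) -> List[Dict[str, Optional[str]]]:
    """Convert positional dump file records into target-format rows."""
    rows = []
    for record in records:
        # Start with every target attribute set to the default (Null stand-in).
        row: Dict[str, Optional[str]] = {attr: default for attr in target_attrs}
        for field, value in zip(dump_fields, record):
            attr = crossings.get(field)
            # Dump fields with no crossing are simply not copied.
            if attr is not None:
                row[attr] = value
        rows.append(row)
    return rows


rows = map_records(
    [["17", "Ann", "unused"]],
    ["Field1", "Field2", "Field3"],
    {"Field1": "ID", "Field2": "First Name"},
    ["ID", "First Name", "Middle Initial"],
)
print(rows)  # [{'ID': '17', 'First Name': 'Ann', 'Middle Initial': None}]
```

Passing default="" instead would fill unmapped attributes with an empty string, one of the alternatives the description mentions.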

The result of appending the two rows completely shown in FIG. 6 to the table 430 shown in FIG. 5 is illustrated in FIG. 11. In particular, FIG. 11 shows the two rows appended, with the data from dump file 110 correctly inserted in table 430. The values in the first column are not part of the data inserted; they are merely record numbers used for illustration. In record numbers 3 and 4, it will be noted that the record attribute Middle Initial is blank. This is because the source data did not include such data. One of skill in this field, after reading this description, will readily understand that such fields may be given an initial value of Null, an empty string, or some other value in keeping with the design of the target digital library. Furthermore, the above example showed a simple case in which the source data format had fewer fields than the desired target data format. The invention will operate properly and with equal effect even in situations in which the number of source data format fields exceeds the number in the desired target data format. In such a situation, the data fields in the source data format which are not available in the desired target data format will have no crossings indicated.

It will further be appreciated that, regardless of the number of fields in the source data format and the desired target data format, there may sometimes be fields in the source data which are not needed in the desired target data format. In such cases, the proper course of action is to indicate no crossing with respect to the data fields of the source data format that are not needed.

To prevent logical confusion, the mapping section output map 750 may be set so that the indication of a first crossing between a source data field and a desired target data format field prohibits any second crossing from being indicated for the row and column that include the first crossing. This prohibition may remain in force unless the first crossing is negated, at which point the row and column that formerly included that first crossing may be crossed as desired (subject to the limitation that no other row or column with a crossing already indicated may be crossed a second time).
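The one-crossing-per-row-and-column rule can be sketched as a small state object. This assumes crossings are identified by (row, column) index pairs; CrossingGrid and its methods are hypothetical names chosen for illustration.

```python
from typing import Set, Tuple


class CrossingGrid:
    """Track crossings so that each row and each column holds at most one."""

    def __init__(self) -> None:
        self.crossings: Set[Tuple[int, int]] = set()

    def can_cross(self, row: int, col: int) -> bool:
        # Allowed only if neither the row nor the column already has a crossing.
        return all(r != row and c != col for r, c in self.crossings)

    def cross(self, row: int, col: int) -> bool:
        """Record a crossing; return False if the rule prohibits it."""
        if not self.can_cross(row, col):
            return False
        self.crossings.add((row, col))
        return True

    def negate(self, row: int, col: int) -> None:
        # Removing a crossing frees its row and column again.
        self.crossings.discard((row, col))


g = CrossingGrid()
print(g.cross(0, 0))  # True
print(g.cross(0, 1))  # False: row 0 already has a crossing
g.negate(0, 0)
print(g.cross(0, 1))  # True: row 0 was freed by the negation
```

Keeping the check in one method means a checkbox or text entry handler only needs to call cross() and refuse the input when it returns False.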

The invention will now be described with reference to FIG. 8, which shows a general flowchart useful in summarizing the above-identified operations.

The operation begins, in this example, with the dump file parse section 500 executing step 502 and the index class selection section 600 executing step 602 in parallel. At step 502, the dump file input signal 150 is requested from the computer system having source data 100, which includes dump file 110. After dump file parse section 500 receives dump file input signal 150, it analyzes the dump file in a well-known manner in step 504. The results of this analysis are examined in step 506 to determine at least the number of fields occurring in each record. The dump file parse signal 550 is prepared and made available to the mapping section 700 in step 508.
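The description leaves the analysis of step 504 to well-known techniques. As one possibility, assuming the dump file is delimited text that Python's standard csv module can read (the comma default is an assumption), steps 504 and 506 might reduce to the following sketch; parse_dump is a hypothetical name.

```python
import csv
import io


def parse_dump(text: str, delimiter: str = ",") -> int:
    """Return the number of fields per record (steps 504-506, sketched).

    Raises ValueError if there are no records or the records disagree.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # csv.reader yields blank lines as empty lists; skip them.
    counts = {len(record) for record in reader if record}
    if not counts:
        raise ValueError("no records found")
    if len(counts) != 1:
        raise ValueError(f"inconsistent field counts: {sorted(counts)}")
    return counts.pop()


print(parse_dump("1,Ann,Lee\n2,Bob,Kim\n"))  # 3
```

Rejecting files whose records have differing field counts is a design choice of this sketch; the description only requires that at least the number of fields be determined.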

In the meantime, index class selection section 600 queries the target digital library to obtain a list of the available plurality of target data formats 300. The list is presented to the user via VDT 800 in step 604 so that the user may indicate which one of the plurality of target data formats is the desired target data format. In other words, the user is solicited to select an index class. Step 606 determines whether the selection of an index class has occurred. If not, processing loops back to step 606 (path n). If the selection of an index class has occurred, processing continues to step 608 (path y). In step 608, the index class selection section 600 queries the target digital library 200 to obtain information concerning the desired target data format (assumed to be target data format 330 in this example). In step 608, at least the target data format attribute names (i.e., the names of the fields) are extracted from digital library 200.

In step 610, the foregoing information concerning desired target data format 330 is included in index class selection signal 650 and made available to mapping section 700.
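Steps 602 through 610 can be sketched against an in-memory stand-in for the digital library. The CATALOG dictionary, its Invoices entry, and both function names are hypothetical; a real system would query digital library 200 instead, and the user's selection (steps 604 and 606) is represented here simply by a string argument.

```python
from typing import Dict, List

# Hypothetical stand-in for the index classes stored in the digital library.
CATALOG: Dict[str, List[str]] = {
    "Employee_Info": ["ID", "First Name", "Middle Initial"],
    "Invoices": ["Number", "Date", "Amount"],
}


def list_index_classes(catalog: Dict[str, List[str]]) -> List[str]:
    """Steps 602-604: the available target data formats, ready for display."""
    return sorted(catalog)


def index_class_selection_signal(
    catalog: Dict[str, List[str]], selected: str
) -> Dict[str, object]:
    """Steps 608-610: extract the attribute names of the selected index class."""
    if selected not in catalog:
        raise KeyError(f"unknown index class: {selected}")
    return {"index_class": selected, "attributes": list(catalog[selected])}


print(list_index_classes(CATALOG))
print(index_class_selection_signal(CATALOG, "Employee_Info"))
```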

At the same time that the dump file parse section 500 and the index class selection section 600 begin their processing, mapping section 700 begins processing with step 702. In step 702, the mapping section 700 waits for the dump file parse signal 550 and the index class selection signal 650 to be made available. At this step, the process continually checks for the availability of these two signals. Unless both signals are available, processing loops back to step 702 (path n). When both signals are available, however, processing continues with step 704 (path y).
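The parallel arrangement of FIG. 8 can be sketched with Python's standard concurrent.futures module. This is an illustration under stated assumptions: the two section functions are hypothetical stand-ins returning fixed lists, and blocking on Future.result() takes the place of the polling loop at step 702.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def dump_file_parse_section() -> List[str]:
    # Hypothetical stand-in for steps 502-508 (dump file parse signal 550).
    return ["Field1", "Field2", "Field3"]


def index_class_selection_section() -> List[str]:
    # Hypothetical stand-in for steps 602-610 (index class selection signal 650).
    return ["ID", "First Name", "Middle Initial"]


def mapping_section() -> Tuple[List[str], List[str]]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        parse_future = pool.submit(dump_file_parse_section)
        select_future = pool.submit(index_class_selection_section)
        # Step 702: wait until both signals are available. result() blocks
        # rather than repeatedly polling.
        dump_fields = parse_future.result()
        target_attrs = select_future.result()
    # Step 704: the two signals supply the grid's first row and first column.
    return dump_fields, target_attrs


print(mapping_section())
```

Blocking on result() avoids busy-waiting; as the description notes, running the two sections serially would give the same result, perhaps more slowly.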

In step 704, the mapping section 700 analyzes the dump file parse signal 550 and the index class selection signal 650 to determine what to use for values in the first row 760 and the first column 770 of the grid in mapping section output map 750. After step 704 is complete, the mapping section output map 750 is generated, including the cells 780 adapted to accept user indications of crossings between dump file fields and desired target data format fields. Also, at step 706, the mapping section output map 750 is presented to the user via VDT 800 so that the user may map the dump file fields to the desired target data format fields as appropriate for the particular dump file being converted and for the particular data table 430 of the target digital library 200.

After presentation of the mapping section output map 750 at step 706, processing continues to step 708. At step 708, it is determined whether the user has indicated that the mapping has been completed. If the mapping is not yet complete, processing loops back to step 708 (path n). If the mapping is complete, however, processing continues to step 710 (path y).

At step 710, the crossings as indicated in cells 780 are used to convert the dump file 110 of source data 100 to the desired target data format 330 for adding to data table 430 of target digital library 200. The details of this step are omitted because, once the crossings have been determined, it is well within the skill of one familiar with this field to use the crossings to perform copying or moving of the data as desired.

It will be appreciated that the foregoing preferred embodiment represents only one way to practice the invention. Although parallel processing has been used as an example, serial processing would also provide the same end result, albeit perhaps more slowly. The flowchart of FIG. 8 has been described with respect to processes, but it will be understood that objects may be instantiated with appropriate member functions to perform the processes. It will also be recognized that an overall control routine or object may be created to ensure the orderly performance of the different tasks.

A grid has been shown containing cells used to indicate 
crossings. Although the rows and columns of the grid may 
be interchanged, the general format of a grid showing 
crossings is a very important aspect of the invention because 
of its perfect clarity and because the user can visually and 
dynamically indicate crossings. 

As a result of the invention, there may now be provided 
a computer system that executes a program according to the 
invention, the program providing for the visual indication of 
crossings in a dynamic manner. It is important to note that 
the program provides for such indication of crossings at 
execution time, without the need to modify the application 
program. Thus, the computer system may facilitate the 
inclusion of source data into a target digital library by the 
repeated use of the program according to the invention. Such 
repeated use decreases the burden on application program- 
mers involved in large scale data integration. 

There is claimed: 

1. A method of visually mapping data of different record 
formats, comprising: 

dumping source data in a source data format to provide a 
dump file of records each having a number of fields; 

parsing said dump file to provide a parse signal indicating 
said number of fields; 

determining, for a target data format, a set of target data 
format fields; 

generating a grid comprising cells each arranged with 
respect to a first direction and a second direction, said 
second direction being orthogonal to said first 
direction, said grid including first field names arranged 
along said first direction and second field names 
arranged along said second direction, wherein: 
said first field names are based on said number of fields indicated by said parse signal; and
said second field names are based on said set of target data format fields;

indicating in said cells crossings between said first field 
names and said second field names; and 

mapping said fields of said records of said dump file to 
said target data format fields based on said crossings. 

2. The method of visually mapping data of different 
record formats as set forth in claim 1, further comprising: 

selecting said target data format from a plurality of target data formats; and

reading, from a digital library, said plurality of target data formats.

3. The method of visually mapping data of different 
record formats as set forth in claim 1, further comprising 
said generating of said grid being performed as to appear on 
a visual display unit. 

4. A computer system for visually mapping data of 
different record formats, comprising: 

a processor, and 

a memory including software instructions adapted to 
enable the computer system to perform the steps of: 
dumping source data in a source data format to provide a dump file of records each having a number of fields;

parsing said dump file to provide a parse signal indi- 
cating said number of fields; 

determining, for a target data format, a set of target data 
format fields; 
generating a grid comprising cells each arranged with
respect to a first direction and a second direction, 
said second direction being orthogonal to said first 
direction, said grid including first field names 
arranged along said first direction and second field 
names arranged along said second direction, 
wherein: 

said first field names are based on said number of fields indicated by said parse signal; and
said second field names are based on said set of target data format fields;
allowing a user to indicate in said cells crossings 
between said first field names and said second field 
names; and 

mapping said fields of said records of said dump file to 
said target data format fields based on said crossings. 

5. The computer system for visually mapping data of 
different record formats as set forth in claim 4, wherein said 
memory further comprises software instructions adapted to 
enable said computer system to perform the steps of: 

allowing a user to select said target data format from a plurality of target data formats; and

reading, from a digital library, said plurality of target data formats.

6. The computer system for visually mapping data of 
different record formats as set forth in claim 4, wherein said 
memory further comprises software instructions adapted to 
enable said computer system to perform said generating of 
said grid so as to appear on a visual display unit. 

7. A computer program product for enabling a computer 
to provide visual mapping data of different record formats, 
comprising: 

software instructions for enabling the computer to per- 
form predetermined operations, and 

a computer readable medium bearing the software instruc- 
tions; 

the predetermined operations including the steps of: 
dumping source data in a source data format to provide
a dump file of records each having a number of 
fields; 

parsing said dump file to provide a parse signal indi- 
cating said number of fields; 

determining, for a target data format, a set of target data 
format fields; 

generating a grid comprising cells each arranged with 
respect to a first direction and a second direction, 
said second direction being orthogonal to said first 
direction, said grid including first field names 
arranged along said first direction and second field 
names arranged along said second direction, 
wherein: 

said first field names are based on said number of fields indicated by said parse signal; and
said second field names are based on said set of target data format fields;
indicating in said cells crossings between said first field names and said second field names; and
allowing a user to map said fields of said records of said dump file to said target data format fields based on said crossings.

8. The computer program product for enabling a computer 
to provide visual mapping data of different record formats, 
as set forth in claim 7, wherein said predetermined operations further comprise:

allowing a user to select said target data format from a plurality of target data formats; and

reading, from a digital library, said plurality of target data formats.

9. The computer program product for enabling a computer
to provide visual mapping data of different record formats, 
as set forth in claim 7, wherein said predetermined opera- 
tions further comprise generating said grid so as to appear on 
a visual display unit. 